Within blended learning environments, the availability and analysis of student data has emerged as a central issue. For struggling students, data generated by digital learning systems present new opportunities to investigate critical success factors. In practice, however, many seemingly basic questions about the persistence, progress, and performance of these learners in online environments may not be readily answered using extant data. This research inquiry generated insight into the practical challenges of data identification, acquisition, and analysis faced by stakeholders seeking to assess the impact of online learning environments on student outcomes. The intent was to investigate the feasibility of, and identify barriers to, creating a unified student record (that is, a single data set combining data on student demographics, activity, and academic performance drawn from different information systems) that might be used to identify salient factors associated with more or less successful student outcomes. This two-phase investigation highlighted key barriers that are important to address in future research efforts and yielded a series of recommendations that could eliminate some of the current challenges to data acquisition and analysis.
DISCONNECTED DATA: THE CHALLENGE OF MATCHING ACTIVITIES
TO OUTCOMES IN ONLINE LEARNING

The Research Center and Its Focus 

The Center on Online Learning and Students with Disabilities (COLSD), funded by the Office of Special Education Programs, United States Department of Education, conducts research on how K-12 online learning impacts the access, participation, and progress of students with disabilities in online settings (Basham, Smith, Greer, & Marino, 2013; Basham, Stahl, Ortiz, Rice, & Smith, 2015; Harvey, Greer, Basham, & Hu, 2014). These settings include full-time virtual schools, blended classrooms that combine online activities with attendance in brick-and-mortar classrooms, and supplemental courses for credit recovery or unique course enrollments. COLSD research focuses on the design, selection, and implementation of digital curriculum materials; the systems that deliver them; and the instructional practices associated with their use, in order to increase efficacy for these students and other elementary and secondary learners participating in online learning environments.

In order to conduct ecologically valid and useful research in online instruction, COLSD has established multiple research-to-practice partnerships with school districts and large-scale online vendors that are presently engaged in online or blended instruction. The goal of these research partnerships is to identify the range of policies and practices, both promising and problematic, that presently exist in online learning and to conduct collaborative research that would guide future policies and practices to optimize student outcomes.

The research described here was conducted in collaboration with leaders from two of COLSD's research-to-practice partnerships: a non-profit online school (NE1)¹ offering supplemental high school courses and a for-profit educational technology and learning management system provider (NE2). NE1 and NE2 are already partnering with each other to provide services to diverse students, and COLSD essentially acts as a third partner: a research partner that brings additional capacity and expertise to conduct research that can benefit NE1 and NE2, as well as other providers like them, in developing optimal services for students with disabilities and their peers.

Reports indicate that nearly every school district in the United States offers some form of blended or online learning and that hundreds of thousands of students are enrolled in full-time virtual schools (Barbour, 2013; Gemin, Pape, Vashaw, & Watson, 2015). COLSD research efforts targeting the approximately 6 percent of students with disabilities (Molnar et al., 2014) enrolled within these various online settings have attended to three interdependent factors: the students, the digital systems they engage with, and the learning context.

¹ Pseudonyms are used for the two providers per the confidentiality terms of the research agreement.

The research initiative that prompted this paper was designed to address these three factors via large-scale data collection in collaboration with NE1 and NE2. In practice, the data analyses fundamental to the research design proved too formidable to carry out within the constraints of this particular project, for reasons that were difficult to anticipate beforehand: missing data, inaccessible data, and uninterpretable data. While COLSD's charge was specific to students with disabilities, its researchers believe that the challenges encountered are both common and generalizable across many existing elementary and secondary online learning systems, that they apply to all students, and that identifying them and suggesting approaches to eliminating them can inform future research efforts.

The Research Environment 

In 2013-2014, NE1 (the online school) enrolled more than 15,000 students from 740 participating brick-and-mortar high schools in 200 secondary-level courses. According to data provided by NE1, 200 of these students were reported as having Individualized Education Plans (IEPs), which may well under-report the actual number of students with IEPs enrolled. As these exceptional students remind us, there is great variability among all learners; thus, instruction and curriculum should not follow a single prescribed trajectory through any online course (Meyer, Rose, & Gordon, 2014; Rose, 2016). Instead of asking “what works best?” in some global sense, it is important to consider the more nuanced research question, “what works best, for whom, and under what conditions?” A review of the literature and of COLSD research indicates, however, that analysis of system usage data to inform educational practice in K-12 online learning is not yet the norm (Burdette, Greer, & Woods, 2013; Kim et al., 2015).

Research evidence substantiates that the analysis of large student data sets can yield correlations with high predictive value that are otherwise unavailable (Baker, 2010; Bienkowski, Feng, & Means, 2012; Markauskaite, 2011; Macfadyen & Dawson, 2010; Reshef, Reshef, Finucane, Grossman, McVean, Turnbaugh, & Sabeti, 2011). In particular, when data sets detailing student use of online learning environments are combined with demographic and student achievement data, it becomes possible to identify student profiles that can be linked with a high degree of accuracy to learning pathways and decision-making (Davies & Graff, 2005; Zorrilla, Garcia, & Alvarez, 2010). This, in turn, can expand the knowledge base for educators in two areas: 1) the identification of students on failure trajectories, and 2) the efficacy of targeted interventions designed to guide students toward more positive outcomes (Steiner, Hamilton, Peet, & Pane, 2015).
In online learning environments, therefore, the integration of three sets of student data has the potential to create new opportunities for understanding student learning, behavior, and progress, as well as for providing more targeted interventions for diverse learners (Hung, Hsu, & Rice, 2012; National Forum on Education Statistics, 2015; U.S. Department of Education, 2013). The three sets are (a) demographic information about an individual student (such as age, disability status, and disability impact), (b) system usage (e.g., online activities, duration), and (c) academic achievement (e.g., grades, formative and summative assessment data). To realize this opportunity, however, each of these data sets must be examined not in isolation but in relation to one another. The variety of data challenges encountered while trying to compile such a unified student record, such as the inability to obtain usage data from vendors and student demographic data from local education agencies (LEAs), forms the basis for this study.

In the NE1/NE2 inquiry specifically, it was discovered that barriers exist at various points in the process: for example, the data do not exist, the data exist but cannot be accessed readily, or the data that do exist cannot be made usable for assessing meaningful educational impact at a reasonable cost, if at all. These findings were surprising, since the cost-effective collection of large amounts of detailed data on student behavior is considered a primary affordance of online learning environments and a foundation for personalization (Martinez, 2002; Romero & Ventura, 2010; Tanenbaum, Le Floch, & Boyle, 2013). In fact, there is evidence that while moving from offline to online learning environments yields much more operational data, existing challenges with accessing, sharing, and using data can in some cases severely limit the use of these data to inform instructional decisions for individual students or to improve the system as a whole so that it better serves particular groups of students, such as those with disabilities (Burdette, Franklin, East, & Mellard, 2015; Burdette, Greer, & Woods, 2013).

Based on the evidence that a correlation analysis of the three student data sets referenced earlier (demographics, usage, and achievement) can facilitate the identification of effective approaches to instruction by helping to identify factors associated with lower- or higher-than-expected achievement for specific subgroups of students, COLSD researchers initiated a two-phase approach to investigate what would be required, and what barriers would need to be overcome, in order to realize this opportunity.

METHODS 

In Phase I, COLSD researchers worked closely with the online school (NE1) and the school's learning management system (LMS) provider (NE2) in an effort to integrate student demographic, usage, and achievement data into a unified student record (USR) that could be analyzed to address research questions about students with disabilities (SWD) and their peers in online learning courses. An effort was made to collect and analyze data from three sources: quantitative student-level data extracted from the student information system (SIS) and LMS, interviews with key staff at NE1 and NE2, and documents provided by the online school and drawn from its courses. This triangulation of data was used to explain the school's operational model and practices and the opportunities and challenges these practices generate for students (Cohen & Manion, 2000). While the full data set supported a number of different analyses, the current study focuses specifically on a set of challenges and barriers that arose in relation to data collection, extraction, and analysis in an online environment. Additional descriptive studies of NE1 and NE2, including their models for supporting diverse students, as well as an analysis of challenges arising at the intersection between them, are the subjects of separate papers (Connell & Johnston, 2015; Johnston & Connell, 2015a; Johnston & Connell, 2015b).

In Phase II, researchers again worked closely with NE1 and NE2, as well as with a third partner providing text-to-speech functionality (referred to here as audio-supported reading, or ASR), to develop a technical specification for data collection and storage that would support the production of a USR. The intent was to implement that specification, produce the USR, and use it to determine if, how, and to what extent students used ASR support in an online course during an academic semester. By factoring ASR use into the analysis, researchers had hoped to identify the extent to which this support was associated with greater- or less-than-anticipated academic achievement for different subgroups of students. In the end, however, this analysis could not be carried out within the parameters of this research project, for reasons described in later sections.

Participants 

COLSD researchers conducted face-to-face and written interviews with staff from both NE1 (the online school) and NE2 (the LMS provider). Interviewees were identified through a purposive sampling strategy (Patton, 2002). School staff at NE1 (n = 7), including senior managers and leaders across the functional areas of technology, instruction, IEP accommodations, and operations, were selected for participation. NE2 staff (n = 7) included leaders, managers, and individual contributors across the functional areas of technology, accessibility testing, instruction, research, analytics, marketing, and operations.
Design and Materials

NE1 and NE2 staff each independently guided researchers through one of their online courses to provide context around a typical student experience. Each participant then completed a semi-structured interview (Maxwell, 1996). This approach was selected to give researchers responses to a common set of questions while also allowing for more expansive reflections that could reveal further differences and similarities in how the two entities perceived and addressed online course design. Researchers were also granted access to the NE1 course environment and NE2's LMS and examined features within the school's course environment designed to support student access to information (e.g., text-to-speech).

A grounded theory approach (Glaser & Strauss, 1967) was used to analyze the interview data. Transcripts of all interviews were reviewed and independently coded by two reviewers. Codes for each interview were created during analysis of the transcript, with both reviewers participating in the coding. Codes were used to summarize key areas discussed by the participants (Charmaz, 2006). Researchers worked through successive stages of coding and analysis, moving from the open codes and categories of the first stages to successively refined themes, to generate theoretical concepts and insights related to the research questions. Through this analytic process, the researchers identified and selected three theoretical concepts to analyze in depth:

• Task Structure and Competencies: The structure of tasks a student must 
engage in to learn in NE1 and the competencies they need to do so 

• Role Structure: The structure of roles to support student learning in 
NE1 

• Data Flows: The flow of data through the system from collection to 
interpretation to application in support of student learning 

This study focuses on the third item, data flows, while the other two concepts are addressed elsewhere (Connell & Johnston, 2015; Johnston & Connell, 2015a; Johnston & Connell, 2015b). In Phase I of this initiative investigating data flows, the goal was to conduct a retrospective analysis of existing data, that is, student data that had been collected by the online school and the LMS provider in prior years. As such, the first step was to identify what kinds of questions might be answered given the available data. Interviews included questions about information and data systems; processes for collecting, storing, and using student data; and known barriers to using data effectively to support individual student learning. These interviews, together with a review of supporting documentation such as web pages where student data are collected, database specifications, and data dashboards and reports available to teachers, enabled the compilation of an inventory of key information systems and of the relevant data either stored in them or flowing through them (Figure 1). Based on this inventory, a wish list of demographic variables was compiled, and a minimal set of demographic, usage, and achievement variables was chosen from among those confirmed to be used or logged somewhere in the system (see Appendix A for a list of these variables and measures). The idea was to demonstrate feasibility with a minimal set of measures initially and then expand the scope to include more measures. In particular, the initial effort focused on compiling a USR containing student status (IEP, non-IEP), basic measures of student usage of the online platform, and course grades, all of which were data either stored in or used by the system. Researchers worked with the online school and the LMS provider to extract these data.
Figure 1. Schematic of the secondary online learning system.

Researchers discovered that, because of the data's importance and sensitivity, as well as the relatively small number of students on IEPs each year, the online school processed IEP information manually outside of the system, and no record of it was stored in the information system in a structured way. The school did, however, store course grades in a structured form in a database. The LMS provider supplied a set of database files containing raw data on student usage of the platform.
Phase II was intended to be a prospective research study involving NE1, NE2, and a new partner (a third-party provider of online ASR support integrated into NE1's classes delivered via NE2's LMS). Whereas Phase I was retrospective, Phase II started with specific research questions, and, based on those questions, the partners identified what data to collect and what technical changes to make to the platforms to support that data collection. Again, the first goal was to generate a USR including just a minimal set of variables: IEP status, usage of ASR in classes, and course grades. The ASR provider submitted samples of the data it routinely collected, and based on those samples researchers were able to draft a technical specification for the work that would need to be done by the online school, the LMS, and the ASR platforms to produce a USR for analysis (see Appendix B for a list of measures considered for inclusion in the Phase II USR).

Two collaborative and sequenced efforts were thus made to produce a USR to assess the impact of LMS usage generally (Phase I) and of usage of a specific form of support (Phase II) on student outcomes. This approach to data analysis was expected to yield correlations that could provide quantitative insight into the recruitment, enrollment, persistence, progress, and performance of students participating in this supplemental online learning program (Thompson, Diamond, McWilliam, Snyder, & Snyder, 2005). While major barriers ultimately prevented the extraction of usable data, researchers concluded that detailing the results of this effort and the barriers encountered would benefit future inquiries and point to strategies for making these and similar data usable.

The sections that follow highlight the components of each of the Phase I 
and Phase II research efforts, the evidence basis for the research approach, 
and a detailed analysis of the barriers encountered. 

PHASE I DETAIL: RETROSPECTIVE UNIFICATION OF STUDENT DATA 

The intent of the initial systems review was to compile and analyze quantitative data collected by both the online school and the LMS on the recruitment, enrollment, retention, progress, and performance of SWD and to contextualize these data with descriptive information from interviews with staff. As a first step, a diagram summarizing the primary information systems where student and other data are stored was compiled (Figure 1). Following that, researchers set out to extract a minimal USR from historical data, limited to student data that the interviews had indicated were available. Specifically, an attempt was made to compile a USR using the following types of data elements: (a) demographic: student ID, IEP status (Yes/No); (b) usage: frequency of login, time spent on platform, optional features used, pages visited; and (c) learning outcomes: assignment scores, end-of-course grades, course completion (Yes/No).
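To make the intended structure concrete, the sketch below joins three hypothetical extracts (SIS demographics, LMS usage, and grade-book outcomes) on a shared student ID to form a minimal USR containing a subset of the Phase I data elements listed above. The dictionaries, field names, and values are illustrative assumptions, not the actual NE1 or NE2 schemas. One design choice is worth noting: a value missing from any source is kept as None, so gaps such as an absent IEP flag remain visible instead of students silently dropping out of the record.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical source extracts keyed by student ID (illustrative values only).
demographics = {"S001": {"iep": True}, "S002": {"iep": False}}            # SIS
usage = {"S001": {"logins": 14, "minutes_on_platform": 310},
         "S002": {"logins": 22, "minutes_on_platform": 480}}              # LMS logs
outcomes = {"S001": {"final_grade": 78.0, "completed": True},
            "S002": {"final_grade": 91.5, "completed": True}}             # grade book


@dataclass
class UnifiedStudentRecord:
    student_id: str
    iep: Optional[bool]
    logins: Optional[int]
    minutes_on_platform: Optional[int]
    final_grade: Optional[float]
    completed: Optional[bool]


def build_usr(demographics, usage, outcomes):
    """Join the three sources on student ID into one record per student.

    Every ID found in any source gets a record; fields a source does not
    supply are left as None so that gaps stay visible.
    """
    records = []
    for sid in sorted(set(demographics) | set(usage) | set(outcomes)):
        d = demographics.get(sid, {})
        u = usage.get(sid, {})
        o = outcomes.get(sid, {})
        records.append(UnifiedStudentRecord(
            student_id=sid,
            iep=d.get("iep"),
            logins=u.get("logins"),
            minutes_on_platform=u.get("minutes_on_platform"),
            final_grade=o.get("final_grade"),
            completed=o.get("completed"),
        ))
    return records


for record in build_usr(demographics, usage, outcomes):
    print(record)
```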
Phase I Results

As had become clear from prior COLSD research and that of others, assessing the impact of digital curriculum content and digital delivery systems required attending to the three primary components of any online learning system: the learner, the system design, and the context of its implementation (Hamilton, Halverson, Jackson, Mandinach, Supovitz, Wayman, Pickens, Martin, & Steele, 2009; Miller, Soh, Samal, Kupzyk, & Nugent, 2015). By triangulating information about the student, his or her use of the online system, and any associated academic achievement outcomes, researchers hoped to identify patterns and relationships that could lead to more focused efficacy studies. This would enable the disaggregation of different student subgroups (in this instance, SWD and their peers) and would provide not just evaluative information on the system's current efficacy but also formative information to inform strategies for increasing its efficacy.

Phase I revealed that, given how the system was configured, the data 
sought could not be readily collected and integrated into a unified record for 
analysis even though these data were being used operationally in the system 
either by machines or by staff and even though the school and the developer 
provided a large amount of raw historical data. A number of barriers were 
identified. 

Barrier 1.1: Key data not reliably structured 

Demographic data (such as a student's IEP status, gender, and age) were used by staff to support individual students but were not all recorded reliably in a form that could be readily extracted. From an operational standpoint, for example, important information about students includes (a) whether they have an IEP and (b) if so, what accommodations are recommended for that individual. Interview findings indicated that when an IEP is sent to the online school, it typically describes the necessary accommodations for a student. Usual protocol directs that a contact person within the school ensures that the IEP requirements are forwarded to the student's online teacher(s), but since the school had no need for this information after the semester ended, it did not record it in the formal online student record. Therefore, the IEP information was not stored anywhere in the student information system. The school did keep informal tallies of which classes students with IEPs enrolled in, as well as tallies for other purposes.

Consequently, while the school was able to report the percentage of enrolled students who had IEPs by simply counting the entries in the informal class list, no IEP flag could be reliably associated with each student in the data record for further analysis. This is an example of a situation where key demographic data were dissociated from the student usage and performance data (Figure 2). As a result, an analysis of the persistence, progress, or performance of SWD beyond enrollment counts was not feasible within the parameters of this study. Further, event information such as course drop-out and completion rates for SWD was available only through manual comparison of drop-out/completion lists with ad hoc IEP lists. In other words, relevant data (IEP status) recorded for one purpose (supporting accommodations for SWD) were not usable for analyzing patterns of student registration, enrollment, progress, completion, and/or performance, because the design of the tracking system did not anticipate multiple or comparative use cases.
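The consequence of a missing per-student IEP flag can be illustrated with a small Python sketch, assuming records shaped like a minimal USR (the field names and values are hypothetical). With a flag on each record, completion rates can be disaggregated for SWD and their peers; when IEP status exists only in a separate ad hoc list, every record falls into an "unknown" group and the comparison cannot be made.

```python
from collections import defaultdict


def completion_by_iep(records):
    """Completion rate per IEP group; records without a flag go to 'unknown'."""
    totals = defaultdict(int)
    completed = defaultdict(int)
    for r in records:
        if r["iep"] is None:
            group = "unknown"
        else:
            group = "IEP" if r["iep"] else "non-IEP"
        totals[group] += 1
        completed[group] += int(r["completed"])
    return {group: completed[group] / totals[group] for group in totals}


# Hypothetical records that carry a per-student IEP flag.
flagged = [
    {"student_id": "S001", "iep": True, "completed": True},
    {"student_id": "S002", "iep": True, "completed": False},
    {"student_id": "S003", "iep": False, "completed": True},
]
print(completion_by_iep(flagged))  # {'IEP': 0.5, 'non-IEP': 1.0}

# The same records when IEP status lives only in a separate, unlinked list.
unflagged = [dict(r, iep=None) for r in flagged]
print(completion_by_iep(unflagged))  # only an 'unknown' group remains
```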


LEA/School 

- IEP 

- yiddes 

- Demographics 

L 



Course Features 

Course Materials 
Discussion Boards 
Assignments 
Dropbox 
Quizzes 

Time Mgmt Tools: 

- Calendar 

- Checklists 

- Content Tracking 
Surveys 


Figure 2. The disconnect between demographic data and online system 
usage information 


Barrier 1.2: Data collected in structured formats were not interpretable

While historical data were available for review, typical reports generated by the school focused on operational and instructional procedures and were not suitable for research on individual student activities. In most cases, the data sets acquired through the provided reporting functions represented only a fraction of the data generated by a student in an online course and were not sufficiently granular to establish relationships that could support research or teacher decision-making on progress, performance, usage, needed supports, and other important questions about individual students.

Upon request, the LMS partner provided some of these more granular data for analysis as a raw extract from its database. The database is complex, with student data distributed across many tables that all reference one another. To be usable for research on individual student activities, the database structure would have to be reconstructed from its parts. Once reconstructed, specific data would have to be extracted and flattened into a two-dimensional array, like a spreadsheet, for analysis. During this process, the data tables were found to have cryptic names such as “CLASSACCESSES,” and data fields within tables had names such as “OrgUnitId.”
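The reconstruction and flattening step can be sketched with Python's built-in sqlite3 module. Only the names CLASSACCESSES and OrgUnitId come from the extract described above; the other tables and columns (USERS, ORGUNITS, UserId, StudentCode, CourseName, AccessedAt) and all values are hypothetical stand-ins. The sketch also assumes, without documentation to confirm it, that each CLASSACCESSES row records one student's access to one course. Joining the tables on their shared keys and grouping yields one row per student per course, the spreadsheet-like layout an analysis needs.

```python
import sqlite3

# Hypothetical, simplified schema. Only CLASSACCESSES and OrgUnitId are
# names from the extract; everything else is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE USERS (UserId INTEGER PRIMARY KEY, StudentCode TEXT);
CREATE TABLE ORGUNITS (OrgUnitId INTEGER PRIMARY KEY, CourseName TEXT);
CREATE TABLE CLASSACCESSES (
    UserId INTEGER REFERENCES USERS(UserId),
    OrgUnitId INTEGER REFERENCES ORGUNITS(OrgUnitId),
    AccessedAt TEXT
);
INSERT INTO USERS VALUES (1, 'S001'), (2, 'S002');
INSERT INTO ORGUNITS VALUES (10, 'Biology'), (11, 'Algebra I');
INSERT INTO CLASSACCESSES VALUES
    (1, 10, '2014-01-06T09:00'), (1, 10, '2014-01-07T09:05'),
    (2, 11, '2014-01-06T13:30');
""")

# Flatten the related tables into one row per student per course,
# counting access rows (assumed here to mean course visits).
rows = conn.execute("""
    SELECT u.StudentCode, o.CourseName, COUNT(*) AS Accesses
    FROM CLASSACCESSES AS a
    JOIN USERS AS u ON u.UserId = a.UserId
    JOIN ORGUNITS AS o ON o.OrgUnitId = a.OrgUnitId
    GROUP BY u.StudentCode, o.CourseName
    ORDER BY u.StudentCode, o.CourseName
""").fetchall()

for row in rows:
    print(row)
```

Even in this toy case, the meaning of a CLASSACCESSES row, and therefore of the access count, has to be assumed; in a real extract it would have to come from documentation.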

Under these circumstances, it is generally not possible to make sense of large data sets in the absence of a data dictionary that explains what each field and each table represent. Researchers were provided with a data dictionary for some of the data sets and from it were able to determine that there did not appear to be much useful data on student activity in this particular part of the system. Additional analysis would have been expensive and labor intensive for both the school and the LMS vendor to support, and, given the quality of the data, the team decided not to pursue it further in this project. Consequently, researchers were unable to accurately interpret student activity data for which clearly associated definitions (i.e., a “data dictionary”) were not available. As a result, even though data are collected and stored, they may be unsuitable for research purposes without significant additional investment in data extraction and preparation, in close collaboration among brick-and-mortar schools, online schools, LMS providers, and researchers.
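A first practical check before analysis is to compare the columns in an extract against whatever data dictionary is available. The Python sketch below does this. The dictionary entries (including the assumed meaning of OrgUnitId) and the column names AccessedAt and Flag7 are hypothetical examples, not the vendor's actual documentation.

```python
# Hypothetical data dictionary: field name -> plain-language definition.
# These definitions are assumptions for illustration only.
data_dictionary = {
    "UserId": "Internal numeric identifier of a user account (assumed)",
    "OrgUnitId": "Identifier of a course offering (assumed)",
}


def undocumented_fields(columns, dictionary):
    """Return the extracted columns that have no entry in the dictionary."""
    return [col for col in columns if col not in dictionary]


extract_columns = ["UserId", "OrgUnitId", "AccessedAt", "Flag7"]
missing = undocumented_fields(extract_columns, data_dictionary)
print(missing)  # ['AccessedAt', 'Flag7']
```

Columns flagged this way cannot be interpreted reliably until their definitions are obtained from the data owner.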

Barrier 1.3: Some data are proprietary 

Online schools and LMS providers invest time and money in creating 
and refining their online technology platforms—platform design is part of 
their valuable intellectual property (IP) that they sometimes need to protect 
in order to preserve their ability to continue operating in a competitive en¬ 
vironment. In some cases, sharing information, such as a data dictionary for 
their core database system, creates a risk of exposing their core IP, which, 
in turn, may produce a risk of undermining their sustainable business. Al¬ 
though the online school and the LMS developer were very open and col¬ 
laborative in this project, there were elements they were not comfortable 
sharing. Consequently, just because data are collected and stored in a struc¬ 
tured system in a form that might be usable for analysis does not mean they 
can be shared with others without substantial cost and/or risk to one or more 
stakeholders. 

Barrier 1.4: Data may be expensive to extract, with unknown research value

Some student progress data presented to administrators and teachers via online “dashboards” might be unusable for research purposes. For example, administrators in the online school had access to student activity data (such as when students logged into a course), but these data were available only in a form that could not be readily extracted for research. Login information for a single student or a single class may be displayed, but if there is no option to download or extract these data in bulk, manual collation is labor intensive, is beyond the resources available for this project, and may be prone to human error. Given current and common configurations, a manual data extraction process would have to be repeated for every student or every course, depending on the options the interface provides. In this case, even though data are collected and stored in a structured system in a format that might be usable for research, and even though they are being shared with teachers and administrators for instructional purposes, they still cannot be made available to others for different purposes, such as research, without substantial effort and cost.

While each of the enumerated challenges can be prohibitive individually, collectively they can be insurmountable within the parameters of many research projects unless these issues are considered and planned for at the outset.

The Impact of the Phase I Barriers 

The barriers discussed above interacted in complex ways and presented both technical and operational challenges. An overarching principle that emerged from Phase I (the first attempt to develop a USR from historical data) is that current online learning systems are not necessarily designed to support this kind of research on student progress, persistence, and performance, whether or not they contain potentially usable data. Unless data tracking systems are designed to support interoperable comparisons of student data, it is not reasonable to assume that existing data sets can be used to gain any true measure of educational efficacy.

PHASE II: PROSPECTIVE ANALYSIS OF STUDENT LEARNING SUPPORTS 

In light of the barriers encountered in Phase I, Phase II sought to identify the minimal set of technical changes that would be required to compile usable research data on the persistence, progress, performance, and malleable factors for SWD in an online setting. In this phase, a fourth collaborator was added: a third-party provider of text-to-speech functionality, referred to as audio-supported reading or ASR. The ASR module connects to NE2's LMS platform and is used in some of NE1's online courses. ASR use was selected for monitoring because many SWD have challenges with reading text-based media, online environments are often text-heavy, and text-to-speech is frequently an effective accommodation for SWD who struggle with reading (Izzo, Yurick, & McArrell, 2009; Meyer & Bouck, 2014). In Phase II, researchers sought to compile data similar to those in the previous effort but focused specifically on students' use of ASR functionality in coursework.
From the full list of measures considered for this phase (Appendix B), the minimal set of measures in each category sought for the initial analysis is as follows:

• Demographic data 

° Unique Student Identifier 

° IEP Flag (Yes/No variable indicating if the student has an IEP) 

° Age, grade level, gender, ESS/ELL, school zip code 

• Usage data: Text to speech 

° Course ID, Page ID 

° Student action (start audio, stop audio, pause audio) 

° Selected text (or “all”) 

° Time start and end for playing audio 

• Student performance data 

° Assignment scores 

° Final grades 

° Participation scores (if available) 
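To make the idea of a USR concrete, the following is a minimal Python sketch of how the three categories listed above might be joined into one record per student. It assumes hypothetical record layouts: the class and field names (`Demographics`, `AsrEvent`, `student_id`, `asr_seconds`, and so on) are illustrative and are not drawn from any partner's system. ASR events that arrive without a student identifier are counted but cannot be attributed to anyone, mirroring the problem of usage recorded without a student-level identifier.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Demographics:
    student_id: str   # coded student identifier (hypothetical field name)
    iep: bool         # IEP flag
    grade_level: int


@dataclass
class AsrEvent:
    student_id: Optional[str]  # None models ASR use logged without a student ID
    course_id: str
    page_id: str
    start_s: float    # audio start time, in seconds
    end_s: float      # audio end time, in seconds


@dataclass
class Performance:
    student_id: str
    final_grade: float


def build_usr(
    demographics: List[Demographics],
    events: List[AsrEvent],
    performance: List[Performance],
) -> Tuple[Dict[str, dict], int]:
    """Join demographic, usage, and achievement data on student_id.

    Returns one unified record per student plus the number of ASR events
    that could not be attributed to any student.
    """
    asr_seconds: Dict[str, float] = defaultdict(float)
    unattributed = 0
    for e in events:
        if e.student_id is None:
            unattributed += 1  # usage exists but cannot be linked to a learner
            continue
        asr_seconds[e.student_id] += max(0.0, e.end_s - e.start_s)

    grades = {p.student_id: p.final_grade for p in performance}

    usr: Dict[str, dict] = {}
    for d in demographics:
        usr[d.student_id] = {
            "iep": d.iep,
            "grade_level": d.grade_level,
            "asr_seconds": asr_seconds.get(d.student_id, 0.0),
            "final_grade": grades.get(d.student_id),  # None if no grade found
        }
    return usr, unattributed


usr, lost = build_usr(
    [Demographics("s1", True, 9), Demographics("s2", False, 9)],
    [AsrEvent("s1", "c1", "p1", 0.0, 120.0),
     AsrEvent("s1", "c1", "p2", 10.0, 40.0),
     AsrEvent(None, "c1", "p1", 0.0, 60.0)],
    [Performance("s1", 85.0), Performance("s2", 78.0)],
)
print(usr["s1"]["asr_seconds"], lost)  # 150.0 1
```

The join key is the only thing connecting the three categories; the 60 seconds of ASR use logged without an identifier simply drop out of the analysis.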

In Phase II the minimum goal was to collect data in each of the three 
categories (demographic, usage, and achievement) as a proof of concept for 
an analysis relating the use of this support by SWD and their peers in an 
online setting to any associated academic outcomes. Potential applications 
of this type of analysis include informing teachers about student use of 
supports, performance, and work completion to guide instruction, and 
providing students with feedback on their usage to help them become more 
self-regulated learners (Steiner et al., 2015). COLSD researchers and their 
collaborators made promising progress in identifying a set of technical 
requirements supporting production of a USR. In working to implement this 
specification, however, researchers identified a number of new data-related 
barriers that must be overcome to carry out this kind of research. These 
barriers are described as follows. 

Barrier 2.1: Key features used but not tracked. 

Students may use features such as ASR within online courses, but no 
student-level data on this use were recorded (e.g., how much text was read, 
which text was selected for read-aloud). Operationally, ASR support is 
provided at the course level, and the third-party ASR provider does not need 
to store such data, or to identify ASR use at the individual student level, 
in order to provide the service. Consequently, the existence of learning 
supports in an online environment does not imply that data on student usage 
of those supports are being recorded for future analysis. 
Upon investigation of the data flow, it was determined that a unique student 
identifier (generally a coded number) would have to be passed through each 
part of the system to make it possible to unify the student data collected in 
each part for analysis. Passing such an identifier from the school through 
the LMS to the ASR module (a necessary requirement for matching students 
to ASR use), however, was found to require considerable technical work on 
the core LMS platform. This work would have required an investment of time 
from senior LMS technical staff who were engaged with higher-priority 
projects, and it could have created significant operational risk to the core 
platform. The LMS partner considered these costs and risks too substantial 
to justify adding the necessary functionality within the parameters of this 
project. 
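How such a coded number would be generated is not specified here. One possible approach, offered only as an illustrative assumption and not as what the partners implemented, is a keyed hash: the school holds a secret key and derives from each internal student ID a stable code that the LMS and the ASR module could carry without learning the student's identity. The function name `coded_student_id`, the example key, and the truncation length are hypothetical.

```python
import hashlib
import hmac


def coded_student_id(local_id: str, school_key: bytes, length: int = 16) -> str:
    """Map a school's internal student ID to a stable, opaque code.

    Hypothetical scheme: HMAC-SHA256 keyed with a secret held only by the
    school. The same student always yields the same code, so records that
    different systems tag with it can later be joined, but those systems
    never receive the underlying internal ID.
    """
    mac = hmac.new(school_key, local_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:length]  # truncation length is an arbitrary choice


key = b"example-secret-held-by-school"  # placeholder; a real key would be random
print(coded_student_id("S-1001", key) == coded_student_id("S-1001", key))  # True
```

Generating the code is the easy part; as the passage above notes, the costly step is carrying it through every component of the LMS so that each data point is tagged with it.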

Barrier 2.2: Limited ASR access. 

It was further discovered that ASR functionality was available only in 
limited areas of the school’s courses, due at least in part to the way 
third-party functionality (e.g., wikis, discussion boards, and ASR support) 
integrates into the core LMS. The ASR was made available only for reading 
primary texts in the coursework and could not be used in other areas, such 
as wikis and discussion boards, where students communicated with teachers 
and each other and where it might have supported writing as well as 
reading. This was despite the fact that wiki and discussion board activities 
were perceived as especially relevant to the school’s cohort-based 
instructional model, which was predicated on a critical mass of students 
interested in the same topic all learning together. Assignments and 
activities were primarily completed as an online group, with teachers 
trained to facilitate text-based peer-to-peer interactions. The limited 
availability of ASR in turn severely limited the availability of ASR usage data. 

Barrier 2.3: Limitations imposed by privacy concerns. 

When the online school expressed interest in collecting data from students 
more systematically to better identify factors associated with the 
achievement of SWD, there was resistance from school superintendents 
whose districts contracted with the online school. They raised concerns and 
uncertainties related to student data privacy and to compliance with federal 
and state data privacy laws. Researchers believe that overcoming the 
challenge of generating a USR for analysis and efficacy investigations is a 
critical prerequisite for conducting ecologically valid research on the 
recruitment, enrollment, persistence, progress, and performance of all 
students in real-world, scaled online environments. In the majority of 
circumstances, research initiated on behalf of the student’s “home” school 
is an allowable use of otherwise private information (34 CFR 99.31(a)(6)). 
In the case of an online school offering supplemental courses to many school 
districts nationwide, however, obtaining multiple permissions and 
agreements proved unworkable within the parameters of this research project. 
DISCUSSION 

Neither phase yielded the usable data COLSD researchers had hoped for, yet 
this inquiry provided significant insight into the practical challenges 
related to data identification, acquisition, and analysis that are faced by 
stakeholders seeking to assess the impact of online learning environments 
on student outcomes. In particular, these efforts revealed hidden 
complexities related to the management and use of student data, 
complexities that may not be widely recognized by students, parents, 
instructors, policy makers, researchers, or even online providers 
themselves. 

Lessons Learned 

Much of the data generated by K-12 online learning systems to date may 
not be configured, coded, or defined in a manner that supports research 
initiatives investigating individual student outcomes. It is possible to 
deliver most online services and supplemental supports (e.g., ASR) without 
recording any distinguishing information about the students (e.g., gender, 
age, ability) who use them. In fact, in many cases the provider of a 
particular service (like ASR) may not have any information about the 
individual student using the service at any given time. Without a 
student-level identifier attached to each piece of a student’s data so that 
the data elements can be unified, the data are virtually useless for 
investigating learning activity, learning outcomes, or malleable factors, 
including differential behaviors and outcomes for different populations 
such as students with and without disabilities. To be clear, the problem is 
not simply that it is expensive or difficult to pull data together for 
analysis, but that without some way to associate a student-level identifier 
with individual data points, it is impossible to conduct a post hoc analysis 
of tool usage by students. Some of the technical requirements for 
ameliorating this issue seem relatively minor, and the benefits could be 
substantial, rendering a great deal of the operational data generated by 
online providers much more useful for research purposes. A concerted effort 
to address this issue should therefore be organized as soon as possible. 

A review of the barriers and challenges encountered during this research 
effort also revealed that valuable and important data are being orphaned 
by the complex interaction of technical, legal, policy, and economic issues 
that arise between organizations. For example, if a school is the legal 
owner of a student’s activity data that an online developer collects, but the 
developer does not provide a means for the school to extract those data in 
bulk, then those data become unavailable for comparison purposes. Under 
current practices, the LMS provider may not have the right to extract the 
data or provide them to a researcher for analysis, and the school does not 
have the means to extract the data itself or to provide them to a 
researcher. To enable viable research that could benefit all stakeholders, 
this capability must be proactively planned for and cooperatively built into 
the system. The benefits of this approach need to be made clear to LMS 
providers to justify investment in accessible, functional data-sharing 
capabilities that enable evaluation and ongoing research. 

Implications for the field 

Given the distributed nature of many of the challenges identified in this 
research project, collaboration is essential. No single organization can 
resolve the issues related to student data aggregation. At the very least, 
online schools must interact with their brick-and-mortar school 
counterparts and/or researchers from other organizations. In circumstances 
like those encountered in these inquiries, the systems involve many 
different providers, each with its own platforms, technologies, and 
information systems, none of which was designed to share student-level 
data. The field as a whole must develop sustainable models of collaboration 
that accommodate the economic, technological, legal, and other constraints 
and needs of the diverse participating organizations, be they commercial, 
educational, academic, nonprofit, governmental, or other. 

Online learning platforms and online courses must be designed up front 
to support research. There is clearly no easy retrofit when it comes to 
using existing online data for learning analytics and research. Online 
platforms are generating huge volumes of data, but most of these data are 
going unused, and in their present state many are likely unusable for 
research purposes. This hinders their potential to help all stakeholders 
support and improve online education for all students. The problem is not 
the sole responsibility of LMS providers: incentives, policies, and other 
supports must be put in place so that providers can treat this issue as a 
priority. When a student (including one with disabilities) moves online, 
there is no way to track persistence, performance, and progress or to 
identify malleable success factors. This is a serious problem. Making these 
environments more research-friendly would benefit all learners, but for 
SWD in particular it seems quite urgent. 

Learning data interoperability standards are urgently needed in education 
(IMS Global, n.d.-a; Jakimoski, 2016; Maylahn, n.d.). Learning Tools 
Interoperability (LTI) and other similar standards already exist for the 
technical integration of learning systems (IMS Global, n.d.-b). These 
standards need to be further enhanced to support data interoperability and 
data quality, including provisions for the requirements of research. For 
example, as described previously, online platforms can provide a wide range 
of services without recording student-level data and without attaching a 
unique ID that can be used to unify the data for later analysis. Technical 
support for coordinating a unique student ID should be added to these 
standards, along with guidelines for providers on best practices regarding 
what data to store and what additional information to document, such as 
data dictionaries and naming conventions for database fields that make the 
data more interpretable. IMS Global’s Caliper framework (IMS Global, 
n.d.-a) is one example of a promising approach. The field at large should 
support work on developing and disseminating this kind of framework, since 
this is a matter of shared concern with substantial benefits for all 
stakeholders. 

Legal guidelines and standard data-sharing agreements regarding privacy, 
data ownership, and usage need to be clearer and more readily 
understandable. The Creative Commons Share Alike (CCSA) licenses, for 
example, provide a model for this kind of collaborative resource (see 
https://creativecommons.org/share-your-work/). CCSA licenses are standard 
licenses that were developed centrally and are made freely available to 
anyone who wants to use them. This approach dramatically reduces the 
overall cost of licensing compared with having every provider create its 
own one-off licenses, thereby lowering barriers to entry and expanding 
participation. It also provides a common framework and shared 
understanding that facilitate communication and decision making among 
the many providers and consumers of Creative Commons materials. 

A USR that allows demographic, usage, and achievement data to be 
associated with an individual student is a necessary requirement for 
realizing the full potential of networked learning environments. It can 
facilitate monitoring student progress, adapting instruction for diverse 
learners, conducting research on what is working more or less well for 
which students and under what conditions, testing design assumptions, and 
identifying ways to continuously improve the system. These benefits would 
be important for all learners, but especially for those at the margins (such 
as SWD), who often fare least well and require the most adaptation and 
support to learn successfully. A centralized effort, analogous to Creative 
Commons, to create and maintain the technical specifications, 
easy-to-understand legal and technical guidelines for each participating 
organization (e.g., brick-and-mortar schools, districts, online schools, 
online platform providers, third-party tool providers, researchers), 
standardized legal agreements, model memoranda of understanding for each 
participant, and other materials necessary to produce USRs for research 
could have a profound positive impact on the pace, scope, quality, and 
cost of research and development in online education. 
CONCLUSION 

Triangulation is a fundamental orienting principle of surveying and 
navigation. It provides the location of an unknown point by creating 
intersecting lines from three additional, known points. The three major 
categories of existing student data common to most online learning 
environments (demographics, system usage, and achievement) offer the 
potential to triangulate in a similar way and, in sufficient quantity, to 
yield correlations that point to factors associated with greater or less 
than expected academic achievement. These correlations can, in turn, help 
to narrow the focus of subsequent efficacy studies. Knowing what works for 
which types of students under what circumstances is a core consideration 
for any instructional intervention, and this information could help 
curriculum designers and developers, school-based decision makers, parents, 
and students alike. 

This COLSD research initiative is believed to be a fairly representative 
depiction of the existing complexities (technological, policy-based, and 
legal) that can be encountered when attempting to combine student data 
from the three categories referenced above to gain a clearer picture of what 
is working and what is not. This challenge should be of particular concern 
to those crafting, implementing, and accounting for education services 
(including special education) for elementary and secondary SWD. For most 
struggling students engaged in full-time virtual schooling, direct, 
face-to-face monitoring of the type, duration, extent, and impact of support 
services is simply not available. Consequently, access to data capturing 
these factors may be not merely preferable but essential for both 
accountability and instructional purposes. In blended settings, where 
students may spend 40% to 60% of their curricular interactions online, 
similar access to meaningful data is critical. For students taking more 
limited supplemental online coursework, these data can be combined with 
the face-to-face progress monitoring, observation, and communication 
afforded by the traditional approach to support service delivery in 
brick-and-mortar settings. 

None of the challenges researchers encountered were judged to be 
insurmountable, yet each was significant. In addition, each of the 
technological, policy-based, and legal issues requires the attention and 
expertise of a range of stakeholders: developers of digital curriculum and 
delivery systems, educators, technology standards experts, policy makers, 
researchers, and others. This suggests that the best approach to remedying 
the current data discontinuity will require consensus building among and 
across all of these groups. 
APPENDIX A 

DESIRED DATA ELEMENTS CONSIDERED FOR INCLUSION IN A UNIFIED 
STUDENT RECORD (USR) FOR PHASE I 

(Note: This list excludes specific usage measures, such as login events and 
learning features utilized.) 

• Student demographic information 

° Unique anonymous student ID number 

° Birthdate 

° Name and zip code of brick-and-mortar school they attend (if 
appropriate) 

° IEP status (yes or no); if yes, disability category/type (choose 
from a list) 

° 504 plan (yes or no) 

° Grade level 

° Race/Ethnicity 

° Gender 

° Language status 

• Gross usage of the online platform 

° Number of online courses the student has taken prior to current 
year 

° Number of online courses the student passed, failed, or dropped/withdrew 
from prior to current year 

° Number and names of online courses the student was enrolled in 
during the previous academic year 

° Number and names of online courses the student dropped/withdrew 
from during the previous academic year 

• Student achievement 

° Individual online assignment grades during the previous academic 
year 

° Student’s online course quiz scores during the previous academic 
year 

° Student’s online final course grades for the previous academic year 
APPENDIX B 

MEASURES PROPOSED FOR CONSIDERATION IN PHASE II 

• Student information (demographic and other) 

° Unique anonymous student ID number (issued by the online school) 
° Student age 
° Student grade level 
° Gender 

° Zip code of home or school (proxy for SES) 

° IEP/504 status (binary flag: Yes/No) 

° Other courses student is taking simultaneously with the online 
course(s) 

° Prior enrollment in online courses, if any 
° Race/Ethnicity 
° Free/reduced-price lunch status 

° Is English the primary language spoken at the student’s home? 

° Does student participate in an ESL/ELL (English as a Second 
Language/English Language Learners) program at their school? 

• Course information 

° Course name 

° Course category information: Level, content, credit recovery, AP, 
etc. 

• Student achievement data 

° Assignment scores 

° End-of-term grades for the course of interest 
° Overall course grades for the course of interest 
° Participation grade if it exists 

° Grades from other courses taken with the online course provider 

• Data dictionaries for all available data sets 

• Content usage (LMS events) 

° Student ID (if different from unique anonymous student ID number) 
° Online content page ID 
° Online content page section ID (if available) 

° Page load time stamp 

° Page exit time stamp, if available, or next page load time stamp 

• Learning activities (for each technology component of interest, such as 
ASR) 

° Student ID that matches the online course student ID 
° Course ID 
° Course category 
° Page ID/URL 

° Page section identifier (if available) 

° Event ID (events triggered when student takes an action) 

° Event data and definition (text selected, text entered, options 
chosen, etc.) 

° Event time stamp where applicable—start/stop (or start and 
duration) 


